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1 .  Introduction 

The  purpose  of  this  paper  is  to  consolidate  and  extend  recent  advances 
in  the  use  of  mathematical  programming  duality  theory  in  discrete  optimization. 
In  particular,  meaningful  dual  programs  have  been  identified  for  the  integer 
programming  problem  (Fisher  and  Shapiro  [11]),  the  resource  constrained  network 
scheduling  problem  (Fisher  [10])  and  the  traveling  salesman  problem  (Held  and 
Karp  [22],  [23]).  The  solution  techniques  developed  to  exploit  the  structure 
of  these  dual  problems  are  similar  and  may  be  applied  to  a  general  class  of 
discrete  optimization  problems.  The  general  problem  we  consider  is:  Find 

v(b)  =  min  f(x) 

s.t.  g(x)  <  b  (1.1) 

xeX 
where  X  C R^  is  a  finite  set,  f(x)  a  real  valued  function  and  g(x)  a  function 
with  values  in  R'".  We  call  problem  (1.1)  the  primal  problem. 

The  dual  problem  is  constructed  in  the  usual  fashion  (Geoffrion  [13]). 

For  u  >^  0,  define  the  Lagrangean  function 

L(u)  =-  ub  +  min{f(x)  +  ug(x)}. 
xeX 

It  is  well  known  and  easily  shown  that  L(u)  <   v(b)  and  the  dual  problem  is  to 

find  the  best  lower  bound;  namely,  find 

w(b)  =  max  L(u) 

(1.2) 
s.t.  u  >^  0  . 

By  construction,  w(b)  _<  v(b)   and  with  the  non-convex  structure  of  our  primal 

problem  there  can  be  no  guarantee  that  w(b)   =  v(b). 

It  is  often  difficult  to  solve  (1.1)   directly  and  some  search  of  the 

finite  set  X  is  required.     Our  purpose  in  constructing  and  solving  one  or  more 

dual   problems   is   to  make  this  search  efficient  by  exploiting  the   following 

properties   of  the  duals s   (1)   They  are  easier  to  solve  than  the  primal,    (2)   They 
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provide  lower  bounds  to  the  primal  problem  objective  function  and  those  of  any 
primal  subproblems  derived  during  the  search  and  (3)  they  may  yield  optimal  or 
good  feasible  solutions  to  the  primal. 

The  plan  of  this  paper  is  the  following.  Section  2  contains  four  specific 
applications  of  the  discrete  duality  theory  being  studied  here.  The  following 
section  is  devoted  to  a  brief  review  of  the  properties  of  dual  problems  that 
we  use  in  the  construction  of  algorithms  for  solving  them.  Section  3  also  con- 
tains a  brief  discussion  of  the  many  dual  problems  in  addition  to  (1.2)  which 
can  be  constructed  from  a  given  primal  problem.  Section  4  contains  a  general 
branch  and  bound  algorithm  for  solving  the  primal  problem  which  uses  the  dual 
problem  (1.2). 

The  following  three  sections  are  devoted  to  methods  for  solving  a  given 
dual  problem.  Section  5  contains  a  primal-dual  ascent  algorithm  for  the  dual 
problem.  Section  6  contains  an  approximate  search  algorithm  based  on  the 
primal -dual  algorithm  of  section  5  and  Scarfs s  method  of  simplicial  approxima- 
tion (Scarf  [30],  Hansen  and  Scarf  [21]).  The  nature  of  Scarf's  approximation 
can  be  given  a  precise  interpretation  in  this  context,  and  sufficient  conditions 
that  the  approximation  is  exact  are  presented. 

The  following  section  contains  a  brief  discussion  of  subgradient  relaxation 
methods  for  solving  the  dual  problem  which  have  been  studied  by  Held  and  Karp 
[23]  and  Held,  Karp  and  Wolfe  [24]. 
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2.  Examples  of  Discrete  Dual  Problems 

In  this  section  we  discuss  several  well-known  problems  for  which  the  dual 

problem  in  Section  1  may  be  constructed.  These  problems  will  illustrate  the 

type  of  imbedded  special  structures  which  give  rise  to  a  set  X  for  which  the 

problem  min  f(x)  +  ug(x)  is  comparatively  easy  to  solve.  Minor  modifications 

xeX 
to  the  structure  of  (1.1)  are  needed  for  some  of  these  applications,  and  the 

corresponding  modifications  to  the  dual  will  be  indicated. 

Four  problems  will  be  discussed:  The  traveling  salesman  problem  (Held 
and  Karp  [22,  23]),  the  general  integer  programming  (IP)  problem  (Fisher  and 
Shapiro  [11]),  the  resource-constrained  network  scheduling  problem  (Fisher  [10]), 
and  the  capacitated  lot  size  problem  (Dzielinski  and  Gomory  [4],  Fisher  [9]). 
The  references  are  to  descriptions  of  dual  algorithms  developed  for  these  prob- 
lems. Other  problems,  too  numerous  to  discuss  in  detail,  to  which  the  dual 
approach  may  be  applied  include  the  cutting  stock  problem  (Gilmore  and  Gomory 
[14]),  network  synthesis  problems  (Gomory  and  Hu  [15])  and  the  problem  of 
scheduling  elective  hospital  admissions  (Fisher  and  Schrage  [12]). 

2 . 1  The  Symmetric  Traveling  Salesman  Problem  (Held  and  Karp  [22],  [23]) 

Given  a  k  x  k  matrix  of  transition  costs  c.  .  with  c .  .  =  c  . .  we  would 
like  to  find  the  least  cost  tour  of  k  cities  for  a  traveling  salesman.  A 
tour  is  a  path  between  the  cities  in  which  each  city  is  visited  once  and  only 
once. 

The  k  cities  and  the  paths  between  them  form  an  undirected  graph  with  k 

/ 1,;  ] 
nodes  and  p  arcs.  Any  subgraph  of  this  graph  may  be  represented  by  the 

0-1  variables  x .  .,  i  =  1 ,  . . . ,  j  -1 ,  j  =  2,  . . . ,  k,  where  x .  ,  =  1  means  that 

the  arc  (ij)  is  present  in  the  subgraph.  We  let  x  =  (x  ,  x  ,  x  ,  ...,  x    ) 

denote  a  vector  of  these  variables. 
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Fon owing  Held  and  Karp  [22]  we  define  a  1-tree  as  a  subgraph  formed  by 
adding  any  two  arcs  incident  on  node  1   to  a  spanning  tree  for  nodes  2,    ...,   k. 
A  spanning  tree  for  nodes  2,    ...,   k  is  a  eyeless  subgraph  with  at  least  one 
arc  indicdent  to  every  node  2,    ...,   k. 

Note  that  every  tour  is  a  1-tree  but  that  the  reverse  is  not  true.      If 
we  let  v.(x)   denote  the  number  of  arcs  incident  to  node  j   in  the  subgraph 
represented  by  x,  then  the  tours  are  exactly  the  1-trees  which  satisfy  the 
side  constraints 

v^.(x)   =  2,  j  =  2,    ...,   k  -   1 
The  incidence  for  node  1   is  2  by  the  definition  of  a  1-tree,  while  if  nodes 
1   to  k  -  1   have  incidence  2,  then  incidence  of  2  for  node  k  is  implied. 

We  may  represent  the  problem  of  finding  a  minimum  cost  tour  in  the  form 
of  problem  (1.1)  with     g(x)   =  b  by  letting 

X  =   {x|the  sub-graph  defined  by  x  is  a  1-tree}, 

f(x)   -         C^  c  .  .X.  ., 


l<i<j<k 


ij  ij 


g(x)   =   (v^lx),   v^(x),    ...,   v^__^(x)), 

and  b  is  a  k-2  vector  of  2's. 

The  problem  min  f(x)  +  ug(x)  reduces  to  finding  a  minimum  spanning  tree  in  a 

xeX 
graph,  a  problem  relatively  easier  than  the  traveling  salesman  problem.  Since 

the  inequalities  in  (1.1)  have  been  replaced  by  equalities  for  this  applica- 
tion, the  dual  variables  in  (1.2)  should  be  unconstrained  in  sign. 
2.2  The  Integer  Programming  (IP)  Problem  (Gorry  and  Shapiro  [16]) 
We  begin  with  an  IP  problem  written  in  the  form 
min  cw 

Aw  =  d  (2.1) 

w  >^  0  and  integer 
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where  A  is  an  n  x(n  +  m)  integer  matrix  of  rank  m,d  is  an  m  x  1  integer 
vector,  and  c  is  a  1  x(n  +  m)  integer  vector.  Let  B  be  any  m  x  m  nonsingular 
matrix  formed  from  the  columns  of  A  and  N  the  matrix  of  remaining  columns. 
We  partition  w  into  y  and  x,  with  y  corresponding  to  the  columns  in  B,  and  in 
the  same  way  partition  c  into  c  and  c  .  Then  (2.1)  may  be  written  as 

min  z  =  c  B~^d  +  c  -  c  B~"^N)x 

B  N  B 

s.t.  B"-^Nx  +  y  =  B'-^d 

x,y  >^  0  and  integer. 
The  correspondence  with  (1.1)  is  obtained  by  letting 
X  =  {x|x  >_  0  and  integer,  y  integer}. 


)--Z, 


f(x)  =  (c^  -  c^B-^N)x, 


g(x)  =  B"^N  X,  and 
b  =  B"-^d. 

The  problem  min  f(x)  +  ug(x)  is  equivalent  to  finding  the  shortest  route  in 

xeX 
a  special  network  and  Mevy   efficient  algorithms  have  been  developed  for  the 

problem  (e.g.,  see  Gorry  and  Shapiro  [18]). 

It  may  be  shown  that  if  any  component  of  c  -  c  B~^N  +  uB"-^N  for  u  >  0 

N  B  — 

is  less  than  zero  then  L(u)   =  -  oo.     Hence,   it  is  useful   to  require 
S  "  S^     N  +  uB     N  ^  0  when  we  solve  the  dual.     The  modifications  required 
to  include  these  linear  side  constraints  in  the  analysis  of  the  dual   problem 
are  straight-forward  (Fisher  and  Shapiro  [11]). 


2.3  The  Resource-Constrained  Network  Scheduling  Problem  (Fisher  [10]) 

This  problem  concerns  the  scheduling  of  I  jobs  to  be  performed  over  T 
time  periods.  Each  job  consists  of  n.  tasks  and  the  time  to  process  task  ij 
is  a  given  integer  p.  ..  The  tasks  of  each  job  are  constrained  by  a  precedence 
network  defined  by  a  set  of  tasks  B.  .  wnich  must  be  completed  before  task  ij 
is  started.  K  different  resources  are  available,  with  b   denoting  the  amount 
of  resource  k  available  in  period  t.  r.  .,  is  the  amount  of  resource  k  required 

ijk 

by  task  ij  while  in  process.  If  x.  .  denotes  the  variable  start  time  of  task 
ij  and  x  =  (x,,'  ^  ,  ...,  x   )  we  would  like  to  select  a  value  for  x  such 
that  some  function  is  optimized.  For  the  sake  of  concreteness,  assume  we  wish 
to  minimize  the  sum  of  the  tardiness  of  each  job  relative  to  a  due  date  d^. 

The  correspondence  with  problem  (1.1)  is  given  by 

X  =  {xlO  <  X.  .  <  T  -  p.  .,  X.  .  integer,  and  x.  .  >  x.„  +  p..,   £  e  B.  .}, 

'    —   JJ  —       ^2J     IJ  ^  IJ    —       ll  IX.  IJ 

I 

f(x)  =  Z  max(0,  max  (x.  .  +  p.  .  -  d.)), 
i=l       j    ^^    '^ 

g,  (x)  =    Z    r.  .,  where 
^^     ijcJ(t)  '^^ 

J(t)  =  {ijlx.  .  +  1  <  t  <  x.  .  +  p,  .},  and 

'  ij    -     -    ij      '^ij 

g(x)  -  (g_^_^(x),  g^^^^)'  •••'  V^^^^' 
X  is  the  set  of  integer  start  times  which  satisfy  the  precedence  con- 
straints. The  restriction  to  x.  .  integer  is  allowed  because  the  integrality 
of  p .  .  may  be  used  to  show  that  at  least  one  optimal  solution  will  be  integer. 
Letting  u_  denote  the  multipler  on  the  constraint  g, .(x)  £  b  ,  the  problem 

Jet  KU-KV. 

min  f(x)  +  ug(x)  is  a  CPM  problem  in  which  tasks  are  charged  a  price  u   for 

XKl 

using  resource  k  during  period  t.  An  efficient  algorithm  for  this  problem 
is  given  in  [10].  This  algorithm  uses  the  separability  of  the  network  con- 
straints to  facilitate  the  solution  of  min  f(x)  +  ug(x). 

xeX 
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2.4  The  Capacitated  Lot  Size  Problem  (Dziellnski  and  Gomory  [41,  Fisher  [9]) 

We  are  given  a  problem  in  which  I  items  are  to  be  produced  over  T  periods, 
with  a  non-negative  integer  amount  d   demanded  of  item  i  in  period  t.  K 
resources  are  available  in  an  amount  b   for  resource  k  in  period  t.  We  seek 
to  determine  integer  production  amounts  x^^  for  each  item  in  each  period. 
The  amount  of  resource  k  used  when  we  produce  x^^  is  given  by 

/   \     0,  X,  =  0 

where  a   represents  setup  requirements.  Direct  per  unit  production  costs 

ik 

for  each  item  are  given  by  c .  .  Since  demand  is  fixed,  any  inventory  holding 
costs  may  be  included  in  c  . 


The  correspondence  with  (1.1)  is 


^  "  ^^11'  ^12'   •••'  V^ 

T 


{xlx   >  0,  integer,  and  Z  x.^  >  Z  d   for  all  i  and  t} 
It  -  ^,-,  It       ^^^     It 


f(x)  =  I   c.^x.^ 


g(x)  =  {g^_j(x),  g^2^^^'  •••'  s^t^^^^- 


The  problem  min  f(x)  +  ug(x)  is  separable  by  item.  For  any  particular  item, 

xeX 
the  problem  may  be  solved  using  the  dynamic  programming  or  shortest  route 

algorithm  of  Wagner  and  Whitin  [32]. 
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3.   Properties  of  Dual  Problem 

In  this  section,  we  review  the  principal  properties  of  the  dual  problem 

which  will  be  used  in  the  remainder  of  the  paper.   Let  the  finite  set 

t  T 
X  =  {x  }   ,,  and  for  notatlonal  convenience,  let 

t"l 


L(u,  x^)  =  -ub  +  f{x^)   +   ug(x^) 


=  f(x^)  +  uy"^ 


where  y   =  g(x  )  -  b.   Then 


L(u)  =  minimum   L(u,  x  ) 


t=l,...,T 


It  is  well  known  and  easily  shov/n  that  L(u)  is  a  continuous  concave  func- 
tion defined  on  R  .   Let 


Q  =  {u|u>_0  and  L(u)  =  L(u,  x*^)}; 


it  can  easily  be  shown  that  Q   is  a  (possibly  empty)  closed  convex  set. 

With  this  background,  tlie  dual  problem  (1.2)  can  be  rewritten  as  the 
linear  programming  problem:   Find 

w(b)  =  max  w 

s.t.  w  <_  -ub  +  f  (x*^)  +  ug(x'^)      t=l,...,T  (3.1) 

u  >_  0. 

Ttiis  LP  formulation  of  tlie  dual  problem  immediately  gives  us  some  of  its 
properties.   The  function  w(b)  can  be  shown  to  be  convex  and  w(b)  <  +  " 
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if  and  only  if  {x|g(x)  ^  b]  /^   [X]  ^^  (J)  where  [X]  denotes  the  convex  hull  of  X. 

It  will  be  convenient  below  to  be  able  to  compactify  the  set  of  feasible 

u  being  considered  in  (3.1).   The  following  lemma  gives  sufficient  conditions 
under  which  this  can  be  done. 

r  r 

Lemma  3.1.   Suppose  the  point  x   e  X  is  an  interior  point,  i.e.  g.(x  )  <  b., 

m 
i=l,...,m.   Then  without  loss  of  optimality  the  constraint  ^     u .  _^  U  can  be 


i=l   ^ 


added  to  (3.1)  where 


f(x'')  -  w 
U  =  — 


min      (b.-g. (x  )) 
i=l , . . . ,m 


and  w  is  a  lower  bound  on  w(b)  (e.g.,  w  =  L(u)  for  some  u  >^  0) . 

Proof:   Let  u*,  w*  =  w(b)  by  any  optimal  solution  in  (3.1).   We  have 

m 

E  u*(b^-g^(x^))  <   f(x^)  -  w*  <  f (x^)  -  w 
i=l 

since  w*  >_  w  .   Division  by  min      (b.-g.(x  ))  >  0  gives  us 

i=l, . . . ,m 

m        m         (b.-g. (x^))  f (x  )  -  w 

E   u*  1  I  u* — <_ . 

i=l      i=l    min      (b.-g.(x  ))    min      (b.-g.(x  )) 
i=l, .  . .  ,m  i=^l> .  .  .  ,m 

m 
Thus,  the  addition  of  the  constraint   Z  u.  ^  U  does  not  cut  off  any  optimal 

i=l   ^ 


solution  to  (3.1)  and  the  lemma  is  proven. 
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Unfortunately,  the  LP  problem  (3.1)  cannot  he   efficiently  solved  as 
an  ordinary  LP  problem  because  the  number  of  rows  T  is  quite  large  in  most 
applications.   Moreover,  only  a  small  number  of  the  rov/s  will  be  binding 
at  an  optimal  solution.   One  method  of  solution  of  (3.1)  is  the  generalized 
programming  algorithm  of  Dautzij^  and  Wolfe  ([3;  Chapter  24]);  otherwise 
known  as  the  dual  cutting  plane  algorithm  (Zangrvill  [3A]).   In  addition  to 
its  poor  convergence  characteristics,  this  algorithm  is  not  satisfactory 
for  problem  (3.1)  because  it  does  not  provide  monotonically  increasing 
lower  bounds  for  the  branch  and  bound  algorithm  of  the  next  section. 

The  function  L(u)  is  not  dif f erentiable  everywhere  hut  directional 
derivatives  exist  in  all  directions  at  any  point  u  e  R  .   Positive  direc- 
tions are  the  ones  to  use  in  ascent  algorithms.   In  order  to  make  these 
ideas  operational,  we  need  some  definitions.   An  m-vector  y    is  called  a 
subgradiei^t  of  L  at  u  if 

L(u)  <_  L(u)  +  (u  -  u)y  for  all  u. 

Any  subgradient  of  L  at  u  points  into  the  half  space  containing  all  optimal 
solutions  to  the  dual  problem;  namely,  {u|(u-u)y  ^  0)  contains  all  optimal 

dual  solutions.   Thus,  it  may  seem  plausible  to  construct  ascent  algorittims 

k+1    k      k        k 
for  the  dual  problem  of  the  form  u    =  u  '  +  6,  y  v/here  y      is  a  subgradient 

of  L  at  u   and  0   satisfies  L(u  +  0  y  )  =  n^ax  L(u  +  Oy  ) .   The  difficulty 

e>_o 

is,  however,  that  6,  may  be  zero  but  u   is  not  optimal  and  L  increases  in 
the  direction  of  another  subgradient.   To  put  this  difficulty  in  its 
correct  perspective  and  to  see  how  it  can  be  overcome,  we  must  discuss 
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briefly  the  characterization  of  all  subgradients  at  a  point. 
Let 


T(u)    =  {tlLCH)  =  L(u,  x*^)}; 


t      t  -  — 

then  Y   =  g(x  )  -  b  for  any  t  e  T(u)  is  a  subgradient  of  L  at  u  and  the 

collection  3L(u)  of  all  subgradients  of  L  at  u  is  given  by 


9L(u)  =  (yIy  =  j:    a  y*^,  j:    a  =  l,  X  ^  O,  ¥t  e  T(G)}. 
teT(u)  ^  teT(u)  ^  ^ 


With  this  background,  we  can  give  a  necessary  and  sufficient  condition  in 
terms  of  the  set  3L(u)  that  u  is  optimal  in  the  dual  problem.   This  condi- 
tion is  the  basis  for  our  primal-dual  asceiit  algorithm  of  section  five. 

Consider  a  dual  solution  u  _>  0  which  we  wish  to  test  for  optimality 
in  the  dual.   Let 


Ku)    =   {i|G.  =  0} 


c  — 
and  let  I  (u)  denote  the  complementary  set.   The  optimality  test  is  de- 
rived from  the  LP  problem 

"   -  + 

a*  =  min   E   s  +  Z     s 

i=l         c  - 
iel  (u) 

s.t.   Z    X  Y.  -  sT  +  s   =0,  i=l,...,m,  (3.2) 


teT(u)  ^ 


^t  —  ^»  ^  ^  T(u),  sT  >_  0,  s.  _  0,  i=l m. 
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Let  \*   denote  the  optimal  weights  in  (3,2), 

Lemma  3.2,   The  dual  solution  u  _>  0  is  optimal  in  the  dual  problem  (3,1) 
(equivalently  (1.2))  if  and  only  if  a*  =  0  in  the  LP  problem  (3.2). 

Proof:   (Sufficiency)   If  a*  =  0  in  (3,2),  then  the  subgradient 


a*  =  Z  _  A*Y*^  (3,3) 

teT(u) 

satisfies  y*  <_0,    i  e  I(u),  y^  =  0,  i  e  I^(u).   Thus,  for  all  u  >_  0 


L(u)  _<  L(u)  +  (u-u)y* 


=  L(u)  +  I     (u  -u  )y* 
iel(G)     ^ 


=  L(u)  +  E     u.Y* 

±   L(G), 

The  first  inequality  follows  beciiuse  y*  is  a  subgradient,  the  first  equality 
because  y?  =  0»  i  e  I  (u) ,  the  second  equality  because  u,  =0,  i  e  I(u) 
and  the  second  inequality  because  Y?  ^  0»  i  e  I(u)  and  u.  >_  0,   This 
establishes  sufficiency. 

(Necessity)   We  prove  necessity  by  showing  that  a*  >    0  in  (3.2) 
implies  u  is  not  optimal.   To  do  this,  we  need  to  consider  the  dual  LP 
problem  to  (3,2) 
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o*  =  max  V 


s.t.  V  _<  VY  t  e  T(u) 

c  -  (3-^> 

-  1  £  V  <  1        i  e  I  (u) 

0_<v._<l         iciCu) 

V  unconstrained  in  sign 


Let  V*,  V*  =  a*  denote  an  optimal  solution  to  (3.4)  and  suppose  v*  >  0. 
This  implies  that  v*y   >  0  for  all  t  e  T(u)  and  therefore  that  v*  i^   0. 
Consider  now  the  problem 

max  L(u  +  Ov*)  (3.5) 

6  _>  0 

If  the  maximum  of  L(u)  along  the  half  line  u  +  Gv*  for  9  >_  0  is  greater 
than  L(u)  or  if  L(u)  is  unbounded  along  the  half  line,  then  clearly  u  is 
not  optimal  in  the  dual  problem  and  the  lemma  is  proven. 
Suppose  then  that  for  all  6  >_  0 

L(u  +  ev*)  _<  L(u)  (3.6) 

We  will  show  a  contradiction.   For  6  >  0,  we  have 


L(u  +  ev*)  =  min       f(x'^)  +  uy^  +  ev*Y^ 
t=l,...,T 

t    -  t       t 
=  min    f(x  )  +  UY  +  6v*y 

where  the  second  equality  follows  from  (3.6)  and  the  fact  that  v*y   >  0 


^a>^ 
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-  tx,  + 

for  all  t  e  T(u).   Consider  a  sequence  19,  },_,  such  that  6  ->■  0  .   Since 

the  number  of  x  is  finite,  we  can  assume  without  loss  of  generality  that 

s        * 
there  is  an  x  ,  s  ^  T(u),  satisfying  for  all  k 

L(u  +  Q^v*)    =  f(x^)  +  UY^  +  ej^v*Y^  (3.7) 


Taking  limits  on  both  sides  of  (3.7)  and  using  the  continuity  of  L,  we 
have 

L(G)  =  f(x^)  +  uY^ 

or  s  e  T(u),  a  contradiction.   This  proves  necessity  and  the  leirana. 

Lemma  3.2  gives  necessary  and  sufficient  conditions  that  a  vector 
u  ^  0  is  optimal  in  the  dual  problem  (3.1).   It  can  easily  be  shown  that 
X  e  X  satisfying  g(x)  _<  b  is  optimal  in  tiie  primal  problem  (1.1)  if 
L(u)  =  L(u,  x)  and  u(g(x)  -  b)  =  0.   There  is  no  guarantee,  hov/ever,  that 
such  an  x  exists.   Even  if  one  does,  it  might  still  be  quite  difficult 
to  find  it  since  the  number  of  x  e  X  satisfying  L(u)=  L(u,  x)inaay  be 
large.   In  the  next  section,  we  shov;  how  the  dual  problem  can  be  used 
in  a  "fail-safe"  manner  as  an  analytic  tool  in  a  general  branch  and 
bound  algorithm. 

Although  we  have  stated  only  one  dual  problem  (1.2)  and  its  LP 
equivalent  (3.1),  there  are  in  fact  many  possible  duals  of  the  primal 
problem  (1.1).   A  common  problem  manipulation  which  can  be  combined  with 
duality  to  form  many  duals  is  relaxation.   A  relaxation  of  (1.1)  is  a 
problem  of  the  following  form 
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v'(b)  =  min  f(x) 

s.t.  g^(x)  lb.,  i  e  r,  (3  8) 

X  e  X' 

■where  I'  C{l,...,m},  X  C  X' .   Clearly,  v(b)  >  v' (b)  and  a  dual  problem 
of  the  form  (1.2)  with  maximum  value  w' (b)  constructed  from  (3.8)  bears 
the  same  relationship  to  (1.1)  as  does  (1.2)  (i.e.  v(b)  >  v' (b)  >  w'(b)). 
There  is,  of  course,  an  increased  likelihood  that  v(b)  >  w' (b) ,  or  more 
generally  that  v(b)  -  w' (b)  >  v(b)  -  w(b)  >  0,  but  the  larger  duality  gap 
may  be  more  than  compensated  by  the  shorter  time  it  takes  to  calculate 
w'(b).   A  more  complex  form  of  relaxation,  called  data  relaxation,  has 
been  found  for  integer  programming  [17].   This  type  of  relaxation  consists 
of  a  change  in  the  functions  g.(x)  -  b.  so  that  subsequent  Lagrangean 
minimizations  can  be  more  efficiently  accomplished.   We  remark  that  it  is 
possible  to  use  duality  theory  to  try  to  minimize  or  reduce  the  error 
introduced  by  the  data  change  (see  j    4  of  [17]). 

Another  generalization  of  the  dual  problem  (1.2)  due  to  Gould  [21]  is 
to  replace  the  multiplier  vector  u  by  a  multiplier  function  u(.)  satisfying 
certain  monotonicity  properties.   The  Lagrangean  minimization  becomes 

L(u)  =  min{f(x)  +  u(g(x)-b)}. 
xeX 

The  usefulness  of  this  approach  for  discrete  optimization  needs  to  be 

carefully  studied  because  it  appears  difficult  to  solve  the  network  prob- 

whlch   ,ire   the  Lagrangean   minimizations    of 
lems     /    section   2  with  nonlinear   costs.      The   specific   exponential  multiplier 

functions  of  Evans  and  Gould  [7]  are  the  ones  to  try  in  this  context. 
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Finally,  we  mention  briefly  that  the  dual  methods  here  have  been 
widely  used  to  decompose  large  optimization  problems  with  block  diagonal 
or  staircase  structures  (e.g.,  Grinold  [19],  Lasdon  [26]).   We  will  not 
discuss  dual  decomposition  here  but  simply  remark  that  our  methods  are 
applicable  to  decomposable  discrete  problems  with  such  structures. 


-17- 


4.   Branch  and  Bound  Algorithm  for  Discrete  Primal  Problem 

As  we  mentioned  in  the  introduction,  the  nonconvexity  of  the  set  X 
makes  it  possible,  and  for  some  problems  even  likely,  that  there  exist 
a  duality  gap.   Nevertheless,  dual  problems  can  be  usefully  exploited 
in  a  branch  and  bound  algorithm  which  searches  over  the  finite  set  X  for 
an  optimal  solution  to  the  primal.   Our  approach  bears  a  resemblance  to 
the  branch  and  bound  algorithm  of  Falk  and  Soland  [8]  for  a  different 
class  of  nonconvex  programming  problems. 

The  search  of  the  set  X  is  done  in  a  non-redundant  and  implicitly 
exhaustive  fashion.   At  any  stage  of  computation,  the  least  cost  known 
solution  X  e  X  satisfying  g(x)  _<  b  is  called  the  incumbent  with  incumbent 
cost  z  =  f(x).   The  branch  and  bound  algorithm  generates  a  series  of 
subproblems  of  the  form 


v(b;  X**^)  =  min  f(x) 


s.t.  g(x)  <_  b  (4.1) 

„^ 
X  e  X 

where  X  C  X.   If  we  can  find  an  optimal  solution  to  (4.1),  then  we  have 

implicitly  tested  all  subproblems  of  the  form  (4.1)  with  X  replaced  by 

i.         k 
X  CX  and  such  subproblems  do  not  have  to  be  explicitly  enumerated. 

k 
The  same  conclusion  is  true  if  we  can  ascertain  that  v(b;  X  )  >^  z  without 

actually  discovering  the  precise  value  of  v(b;  X  ),   If  either  of  these 

two  cases  obtain,  then  we  say  that  the  subproblem  (4.1)  has  been 

fathomed.   If  it  is  not  fathomed,  then  we  separate  (4.1)  into  new  sub- 

k  a 

problems  of  the  form  (4.1)  with  X  replaced  by  X  ,  J!,=1,...,L,  and 


L 

U 


u  x"-  =  x*^,  X  -"n  X  "^  =  4,,  Jl^  5^  z^. 
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We  attempt  to  fathom  (4.1)  by  solution  of  the  dual  problem 


w(b;  X  )  =  max  w 


s.t.  w  _<  -  ub  +  f  (x^)  +  ug(x*^)      t  E  T^  (4.2) 

u  >  0 


k  t    k 

where  T  C.  T  is  the  index  set  of  the  x  e  X  ,   The  use  of  (4.2)  in 

analysing  (4.1)  is  illustrated  in  figure  4.1  which  we  will  now  discuss 

step  by  step. 


STEPS  1  and  2:   Often  the  initial  subproblem  list  consists  of  only  one 
subproblem  corresponding  to  X. 

STEP  3:   A  good  starting  dual  solution  u  >_  0  is  usually  available  from 
previous  computations. 

STEP  4:   Computing  the  Lagrangean  is  a  netv;ork  optimization;  shortest 
route  computation  for  integer  programming,  minimum  spanning  tree 
for  the  traveling  salesman  problem,  dynamic  programming  shortest 
route  computation  for  the  capacitated  lot  size  problems,  etc. 

-  k  k     • 
STEP  5:   As  a  result  of  step  4,  the  lower  bound  L(u;  X  )  on  v(b;  X  ) 

is  available,  where 

L(u;  X^)  =  -  Gb  +  min  {f(x^)  +  ug(x*^)}  (4.3) 

teT 

-  k 

It  should  be  clear  that  (4.1)  is  fathomed  if  L(u;  X  )  >_  z  since 

L(JI;  X^)  <_  v(b;  X^) . 

s       k 
STEPS  6,  7,  8:   Let  x  ,  s  e  T  ,  be  an  optimal  solution  in  (4.3)  and 

s  s 

suppose  X   is  feasible;  i.e.  g(;."  )  _<_  b.   Since  (4.3)  \ias   not  fathoned 

(step  5),  TJc  have  L(u,  X  ^)  =  f(x')  +  u(g(x ')-'.))  <  z,  with  the  quantity 

—     s  s     '^ 

u(g(x")-b)  <_  0.   Thus,  it  may  or  may  not  be  true  that  f(x  )  <  z,  but 

if  so,  then  the  incumbent  x  should  be  replaced  by  x'.   In  any  case, 

g 

if  x   is  feasible,  we  have  by  duality  theory 
f  (x*")  +  G(g(x^)-b)  <.  v(b;  x'')  £  f  (x^) 
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s  —    s 

and  it  should  be  clear  that  x   is  optimal  if  u(g(x  )-b)  =  0;  i.e., 

if  complementary  slackness  holds. 

STEPS  9,  10:   These  steps  are  concerned  with  the  functioning  of  one  or 
more  algorithms  on  the  dual  problem.   For  example,  step  9  may  be  a 
test  for  optimality  an  the  dual  of  the  current  u.   Alternatively, 
it  may  be  a  test  of  recent  improvement  in  the  dual  lower  bound. 

STEPS  11,  12:   The  separation  of  the  given  subproblem  can  often  be  done 
on  the  basis  of  information  provided  by  the  dual  problem.   For  example, 
in  integer  programming,  problem  (4.3)  may  be  separated  into  two 
descendants  with  a  zero-one  variable  x  set  to  zero  nnd  one  respectively, 

where  x   is  chosen  so  that  the  reduced  cost  f(x.)  +  ug(x  )  is  minimal 

for  u  optimal.   It  is  important  to  point  out  that  the  greatest  lower 
bound  obtained  during  the  dual  analysis  of  (4.3)  remains  a  valid 

a        k 

lower  bound  on  a  subproblem  derived  from  (413)  with  X    C^  X   .      In  step 
12,  the  new  subproblem  selected  from  the  subproblem  list  should  be 
one  with  low  or  minimal  lower  bound. 
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5.   Primal-Dual  Ascent  Algorithm  for  Dual  Problem 

In  this  section  we  give  a  primal-dual  ascent  algorithm  for  solving 
the  dual  problem  in  the  LP  form  (3,1).   The  algorithm  we  propose  is  an 
adaptation  of  the  primal-dual  simplex  algorithm  applied  to  (3.1)  where 
the  large  number  of  rows  are  handled  implicitly.   Our  algorithm  bears  a 
close  resemblance  to  the  algorithm  of  Grinold  in  [20]  for  large  scale 
linear  programming  and  also  to  the  algorithm  of  Bradley  in  [2]. 

It  was  demonstrated  in  lemma  3.2  that  the  point  u  >^  0  is  optimal  in 
(3,1)  if  and  only  if  there  exists  a  convex  combination  y*   of  the  sub- 
gradients  Y  .  t  e  T(u),  such  that  y?  =  0  if  u  >  0  and  y?  i.  0  if  u  =  0. 
The  algorithm  proceeds  by  generating  successive  y  »  t  e  T(u)  until  either 
u  is  proven  optimal  in  the  dual  by  the  test  of  lemma  3.1  or  a  new  u  j'  u 
Is  found  such  that  L(u)  >  L(u). 

Thus,  we  consider  a  generic  step  of  the  algorithm  beginning  at  a 
point  u  >_  0  and  a  set  T'(u)  Ct(u).   Recall  that  I(u)  =  {i|u  =0}.   We 
solve  the  following  LP  problem  which  is  simply  problem  (3.2)  with  a  re- 
stricted number  of  columns: 

a*   =  min  Z  s  +  I     s 

i=l      .  ^c,-, 
lel  (u) 

s.t,   E      X  y  -  s.  +  s   =  0     i=l,...,m  (5.1) 

ttl'iu)      ^   ^  ^    ^ 

IX  =1 

teT'(G)  ^ 

X  >_  0,  t  e  T'(u),  sT  >_  0,  s   >_  0,  i=l m 

It  is  convenient  to  write  the  dual  of  (5.1): 
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max  V 


s.t.  u  <_  VY*^         t  e  T'(u)  (5.2a) 

-l£Vi£l     ie  l'^(u)  (5.2b) 

0  1  Vj^  _<  1       i  e  I(u)  (5.2c) 


Let  o*   be  the  optimal  solution  value  in  (5.1)  and  (5.2),  and  let  X*, 

t  e  T'(u),  V*  be  optimal  values  of  the  primal  and  dual  variables.   It 

should  be  clear  that  the  direction  v*  produced  by  (5.2)  is  somewhat 

arbitrary;  for  example,  the  ranges  on  the  v   in  (5.2b)  and  (5.2c)  except 

V   >  *^  ^"  (5.2c)  could  be  adjusted  to  reflect  differences  in  scaling  on 

different  rows.   The  reader  is  referred  to  Nemhauser  and  Widhelm  [28]  for 

more  discussion  of  this  point. 

By  lemma  3.2,  u  is  optimal  in  the  dual  if  a*   =  0.   We  describe  a 

generic  step  of  the  algorithm  for  the  case  a*   >   0.   Notice  from  (5.2) 

tliat   for   at    least   one   t   e   T'(u),   we  must   have  v*y      =  a*   >   0.      Thus,   V7e 

have  found  a  subgradient  y*  =      Z     X*y      and  a  direction  v*  5*  U  such 

tET'(u)  ^ 

that  an  increase  in  the  Lagrangean  function  is  indicated  in  the  direction 
V*  namely,  for  6  _>_  0, 

L(u  +  ev*)  <_  L(u)  +  ev*Y*  =  L(u)  +  eo*  (5.3) 

The  primal-dual  algorithm  proceeds  by  finding  the  maximal  value  of  0 
such  that  equality  holds  in  (5.3). 
Let 


V(u)  =  {i|v*  <  0}; 
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note  that  V(u)  £l  (G) .   The  search  for  a  maximal  9  with  the  desired 

property  is  confined  to  the  interval  [0,  e    1  where 

max 


This  interval  is  the  one  in  which  u  remains  non-negative.   It  is 

possible,  of  course,  that  V(G)  =  <1>,  and  no  upper  bound  on  9  is  available. 

As  long  as  w(b)  <  +  »,  this  does  not  cause  any  real  difficulty  because 

we  can  easily  devise  a  finite  search  of  the  half  line  G  +  9v*,  for 

9  >,  0,  for  the  maximum  of  L  along  it.   Alternatively,  we  could  add  a 

m 
constraint  Z  u^  _<  U  to  the  dual  problem.   A  valid  U  can  be  derived  if 

an  interior  point  x  e  X  exists  (see  lemma  3.1).  In  any 

case,  an  arbitrary  upper  bound  on  U  can  be  selected.   Tlie  only  inviolate 

use  we  make  of  the  dual  problem  in  the  algorithm  of  section  4  for  solving 

the  given  primal  problem  is  to  provide  lower  bounds.   The  addition  of 
n 

a  constraint  Z  u  ^  U  to  the  dual  (3.1)  does  not  affect  this  property. 
i=l  t-     t-        J 

Henceforth,  we  shall  assume  :hat  the  above  difficulty  is  handled  by 

m 
the  Imposition  of  the  constraint  Z  u.  £  U.   Let  9*  e  [0.  9    1  denote 

the  maximal  value  of  9  such  that  L(G  +  9v*)  =  L(u)  +  9a*.   The  procedure 

for  finding  the  maximum  is  the  following.   Let  9,  =  9    and  at  iteration 

1    max 

k,    compute  L(u  +  Q^v*)    =  L(JI.  x^)  for  some  x*"  e   X.   Notice  that 

L(u  +  9j^v*)  =  L(u.  x^)  +  9^v*Y^.   If  L(G  +  9j^v*)  =  L(u)  +  9j^v*y'^,  then 

Q^  =   9*  and   tlie   search   terminates.      If 

L(u  +   0|^v*)    <   L(u)   +   Bj^v^Y*,  (5.4) 
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then  compute 


The  denominator  in  this  fraction  is  positive  because  of  (5.4)  and  the 


-   k      - 
fact  that  L(u,  x  )  ^1  L(u).   If  9    =  0,  then  set  6*  =  0  and  the  search 


terninates. 

We  show  now  that  (i)  the  solution  of  the  LP  (5.1)  defined  at 
u  +  6''^*  can  begin  with  the  optimal  basis  for  (5.1)  at  u  and  (ii)  the 
value  of  the  objective  function  at  u  +  e"v*  will  be  (barring  r^-Heneracy) 
less  than  a*.      Convergence  of  the  primal-dual  algorithm  is  thereby 
assured  by  the  usual  simplex  criterion  (see  Simonnard  [32;  pp.  128-134]), 
There  are  three  cases  to  consider. 


Case  one:   0  <  9*  <  9 

max 

In  this  case,  we  vrant  to  show  that  the  optimal  basic  variables  from 
(5.1)  at  u  remain  in  the  problem  at  u  +  9*v*  and  with  the  same  objective 
function  coefficients.   First,  for  t  e  T'(u)  such  that  X   is  in  the 
optimal  basis  for  (5.1),  we  have  by  complementary  slackness  (see  (5.2)) 
that  v*Y   =  o*.   This  implies  that  t  e  T(u  +  e*v*)  since  by  construction, 
L(u  +  e*v*)  =  L(u  +  9*v*,  x*^)  =  L(u)  +  e*v*Y*^  =  L(u)  +  9*0*. 

To  complete  the  argument,  we  need  to  consider  the  s   and  s  variables. 

They  clearly  remain  in  (5.1)  at  u  +  9*v*  and  can  be  in  a  starting  feasible 

basis.   We  must  simply  show  that  the  objective  is  unchanged  for  the  s; 

that  are  basic.   For  i  e  I  (u) ,  we  have  that  I  (u)  (^  I    (u  +  9*v*)  because 

9*  <  9    ;  the  coefficient  of  s.  for  such  an  i  thus  remains  equal  to  1. 
max  1 

Consider  s.  for  i  e  I(u).   If  s .  is  basic  in  (5.1),  then  by  complementary 
slackness,  v*  =  0  which  implies  i  c  l(u  +  9*v*)  and  the  objective  function 
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coefficient  remains  equal  to  zero.   In  short,  we  use  the  previously 
optimal  basis  as  a  starting  feasible  basis  for  (5.1)  at  u  +  e*v*  and 
the  objective  function  value  equals  a*. 

To  show  that  this  value  of  a*  can  be  reduced  for  (5.1)  at  u  +  9*v*, 

s  —   s      — 

note  that  9*  >  0  means  we  have  found  an  x  e  X  such  that  L(u,  x  )  >  L(u) 

=  L(u,  X  )  for  X     basic  in  the  starting  basis.   By  construction, 

L(u  +  e*v*)  =  L(u,  X  )  +  e*v*Y^  =  L(u)  +  6*a*  and  therefore  v*y^  <  a*. 

In  other  words,  the  LP  reduced  cost  relative  to  the  starting  basis  of 

the  subgradient  y'  of  L  at  u  +  6*v*  is  v*y   -  o*  <  0.   The  activity  y 

should  be  pivoted  into  the  starting  basis  and  (barring  degeneracy)  the 

objective  function  will  be  decreased. 

Case  two:   6*  =  9 

max 

This  case  is  the  same  as  case  one  except  for  the  possibility  that 

c  -  +  - 

for  some  i  e  I  (u) ,  we  have  s.  basic  and  u.  +  9*v.  =0  implying 

i  e  I(u  +  9*v).   If  this  is  so,  the  objective  function  coefficient  of 

s.  needs  to  be  changed  from  1  to  0  which  causes  a  reduction  in  the 

objective  function.  Problem  (5.1)  is  then  reoptimized. 

Case  three:   9*  =  0 

In  this  case,  we  have  found  an  s  e  T(u)  such  that  L(u,  x  )  =  L(u) 

s  s 

and  v*Y*  -  v*y   >  0  (see  (5.5)).   Thus,  v*y   <  a*,  we  have 

s  e  T(u)  -  T'(u)  and  the  non-basic  y  can  enter  the  basis  causing  the 

objective  function  to  be  reduced. 

The  primal-dual  rule  of  moving  to  the  next  breakpoint  of  the  L  function 
in  the  direction  v*  ensures  convergence  of  the  ascent  algorithm.   It  is  also 
reasonable  to  move  to  the  maximum  of  L  in  the  direction  v*,  but  then  con- 
vergence cannot  be  ensured.   This  is  a  curious  point  which  needs  more  study 
from  both  a  theoretical  and  computational  viewpoint  (see  also  Grinold  [19;  p.  452]) 


-26- 

^*      Sii"Plicial   Approximation  Algorithm  for   Dual    Probl 


em 


In  this  section,  we  present  a  search  method  for  approximating  an 
optimal  solution  to  the  dual  problem  (1.2)  by  approximating  the  optimality 
conditions  of  lemma  3.2.   The  search  uses  Scarfs  method  for  simplicial 
approximation  of  fixed  points  [30 J.   Since  Scarfs  original  work  in 
reference   30  ■      on  fixed  point  approximation,  there  have  been  a 
number  of  important  theoretical  and  computational  extensions  (Hansen  and 
Scarf  [21],  Eaves  [5,6],  Kuhn  [25]).   For  expositional  reasons,  however,  we 
will  ignore  these  extensions  here  and  merely  indicate  where  they  are 
relevant  and  necessary  for  efficient  computation. 

Our  method  draws  on  some  of  the  construction  of  Hansen  and  Scarf  for 
simplicial  approximation  in  non-linear  programming  [21;  pp.  14-18].   The 
major  conceptual  difference  between  our  method  and  theJrs  is  that  we  are 
approximating  optimal  dual  solutions  while  they  are  approximating  optimal 
primal  solutions.   Wagner  [33]  has  also  applied  simplicial  approximation 
to  nonlinear  programming. 

In  order  to  compactify  the  search,  we  must  append  a  constraint  of 
m 
the  form  J^  u.  £  U  to  the  dual  problem  which  we  will  consider  in  the  LP 

form  (cf  (3.1)) 

w(b)  =  max  w 

w  <  -  ub  +  f(x'')  +  ug(x'')      t=l T 

m  (6.1) 

Z   u  _<  U 
1=1   ^ 

u  >  0. 
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We  have  already  discussed  in  lemma  3.1  the  validity  of  adding  sucli  a 
constraint  to  the  dual  problem.   The  necessary  and  sufficient  condition 
for  optimality  of  u  in  (6.1)  from  lemma  3.1  becomes  a*  =  0  in 

a*  =  min  E   s.  +  Z      s. 

1=1  .   T-/~N 

lel  (u) 

s.t.  Z  X+Y.  -sT+s.-e=0  (6.2) 

teT(a)      ^    ^    ^ 


X     ^  0,  t  e  T(u),  s~  ^Of   s     >   0,  i=l,...,m 


where 


m 
e  =  0      if   Z  u  <  U 
i=l 


m 
6  >  0      if   Z  u.  =  U. 
i=l  ^ 


For  technical  reasons,  it  is  convenient  to  add  a  zero   component 
to  the  u  vector  and  consider  our  search  to  be  over  the  simplex 


m 
{u|  X  u  =  U}.  (6.3) 

i=0 


This  modification  holds  only  for  this  section.   In  [30],  Scarf  picks 
points  in  the  set  P  =  {u  ,u  ,...,u  }  where  u   c  S,  k  >  m  and  where 
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u°  -  (O.Mq Mq) 


u^  =  (M^.O.M^ M^) 


u"  =  (M   ...,M  0) 
ID      m 

and  M  >  M  >  ...  >  M  >  U.   In  addition,  the  u   are  chosen  so  that 
0    1  m  ' 

u   j'  u   for  any  i  and  any  k  j'  k  ,  k^  >  m  and  k  >  m.  With  the  points 

k  ^'O      ''m 

u   so  chosen.  Scarf  defines  [30;  p.  1330]  a  primitive  set  u   ,...,u   to 

k  ^  ,   ^i. 

be  one  such  that  there  is  no  u   satisfying  u  >_  min         (u   ) , 

i=0,l, . . . ,m 
i=0,...,in.   In  words,  a  primitive  set  describes  a  subsimplex  in  which  no 

other  u  lie.   From  a  computational  viewpoint,  the  simplex  S  can  be 

partitioned  into  a  regular  grid  of  subsimplices  and  moreover,  the  points 

u  can  be  generated  as  required  (see  Hansen  and  Scarf  [21],  Eaves  [5]). 

The  main  tool  in  Scarf's  simplicial  approximation  algorithm  is  the 

following  result. 

k       k 
Lemma  6.1  (Scarf  [30;  p.  1332])   Let  u   ,...,u  "*  be  a  primitive  set,  and 

k 
let  u   be  a  specific  one  of  these  vectors.   Then,  aside  from  one  excep- 

k  K 

tional  case,  there  is  a  unique  vector  u   different  from  u   and  such  that 

n         11     j-i 
u   ,...,u    ,u  ,u    ,...,u   ,  form  a  primitive  set.   The  exceptional 

K  n         m 

case  occurs  when  the  m  vectors  u   ,  i  ?^  a,  are  all  selected  from  u  ,...,u  , 
and  in  this  case  no  replacement  is  possible. 

To  apply  this  result  to  solving  approximately  (6.1),  we  need  to  apply 
another  theorem  which  extends  the  above  result. 


Theorem  6.1  (Hansen  and  Scarf  [21;  p.  3])   Let 
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A 


1' 

0 

0 

a 

0 

1 

• 

• 

• 

0 

• 

• 

• 

• 

0 

• 

\o 

0 

1 

a 

0,m+l 


in,in+l 


0,K 


m,K 


be   an    (m+l)    x    (K+1)    matrix  and  b      =    (b-,...,b    )    a    (m+1)    non-negative 

u      in 

vector,  such  that  the  k   column  of  A  is  associated  with  the  k.*"^^  vector 

u  e  P.   Assume  that  the  set  of  non-negative  vectors  y,  satisfying 

k       k 
Ay  =  b,  is  bounded.   Then  there  exists  a  primitive  set  u   ,...,u  ^   so 

that  the  columns  k-,...,k   form  a  feasible  basis  for  Ay  =  b. 
U      m 

The  theorem  is  applied  to  (6.1)  by  specifying  the  appropriate  matrix 
A  and  vector  b.   The  matrix  A  is  given  below  and  the  appropriate  u  are 
shown  above  tlieir  associated  columns. 


A  = 


0 
u 

1 

u 

m 
u 

m+l 
u 

'i 

0 

0 

1 

0 

1 

• 

mfl 
^1 

• 

0 

• 

• 

\0    0 


0 

1 


+  1 


m+l  _^  - 
m 


K 


v^l 


m 


k  k  T 

where  y   is  any  subgradient  of  L  at  u  .   The  vector  b   =  (1,...,1), 


Clearly,  the  set  {y|Ay=b,  y^O}  is  bounded  in  this  case. 
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We  sketch  the  constructive  proof  of  theorem  6.1  because  it  implies 
an  algorithm  which  we  need  to  discuss  in  some  detail.   After  the  dis- 
cussion of  the  algorithm  we  will  give  a  discussion  of  the  results  of  the 
algoritlim  and  their  interpretation  as  an  approximation  to  the  optimality 
conditions  of  lemma  3.1. 

The  algorithm  begins  with  the  set  u  ,u  ,..,,u  which  is  not  a 

primitive  set.   A  primitive  set  is  uniquely  formed  by  replacing  u  with 

k* 
the  vector  u   where 


k*        k 
Uq  =  max  Uq  . 

k>m 


We  then  take  the  feasible  basis  corresponding  to  the  first  nrfl  columns 
of  A  and  pivot  in  column  k*.   This  pivot  is  the  ordinary  simplex  pivot. 
If  the  column  of  A  corresponding  to  u   is  pivoted  out  of  the  starting 

basis  then  the  algorithm  terminates.   If  another  column  is  pivoted  out 

k*   1 
of  the  starting  basis,  say  column  k  ,  then  the  primitive  set  u   ,u  ,..., 

u   and  the  new  basis  correspond  except  for  u  and  the  zero   column  of 
A  wtiich  is  still  in  the  basis.   In  this  case,  we  remove  u   from  the  primi- 
tive set,  and  replace  it  with  a  unique  u  e  P  according  to  lemma  6.1.   If 
u   replaces  it,  then  the  algorithm  terminates.   Otherwise,  the  algorithm 

continues  by  repeating  these  same  operations. 

0   1       m 
At  an  arbitrary  iteration,  we  have  a  primitive  set  u   ,u   u 

and  a  feasible  basis  corresponding  to  the  columns  0,k  , ...,k  . 

(i)   Pivot  column  k   into  the  basis.   If  column  0  is  pivoted  out  of 

the  basis,  terminate  the  algorithm.   Otherwise,  go  to  (ii). 
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(ii)   If  column  k  was  pivoted  out  of  the  basis  in  step  (i) ,  take  u 
out  of  the  primitive  set.   If  u   comes  into  the  primitive  set,  terminate 
tiie  algorithm.   Otherwise,  repeat  (i). 

The  finiteness  of  the  algorithm  is  due  to  the  uniqueness  of  the  changes 
at  each  of  the  above  steps  and  the  finiteness  of  the  set  P  and  the  possible 
bases  of  Ay  =  b. 

Thus,  the  algorithm  implicit  in  theorem  6.1  terijiinates  with  a  primi- 

kg      k^ 
tlve  set  u   , . . . ,u   and  a  non-negative  basic  solution  of  Ay  =  b  using 

the  basis  columns  k_,...,k  of  A.   Let  s.,...,s   denote  the  variables 

0      m  0      m 

corresponding  to  the  first  m+1  columns  of  A,  and  let  X    ,  ...,X  denote 

nH"i.      K 

the  remaining  variables.   Let  s  and  X  denote  the  specific  values  of 
these  variables  in  the  terminal  solution;  we  want  to  demonstrate  that 
the  terminal  solution  approximates  the  necessary  and  sufficient  condition 
for  optimality  that  a*  =  0  in  (6.2).   The  condition  is  approximated  be- 
cause the  subgradients  in  the  desired  convex  combination  are  subgradients 
of  L  chosen  not  at  a  single  point  but  from  m+1  points  which  are  close 
together.   There  are  two  cases  to  consider. 


Case  one:   k  ii'  0  (algorithm  terminated  at  step  (i)): 

In  this  case,  we  have  s   =  0  and  row  0  of     Ay  =  b  gives  us 

K 

Z  ^v  =  1  (6.4) 

k=m+l 

k 
Recall  that  X,  >  0  for  those  u  which  are  close  together  in  the  final 

primitive  set.   For  row  i,  i=l,...,m,  we  have 
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^      k 
s.  +  Z     (y.  +  1)X,  =  1, 

^   k=m+l   ^      ^ 


and  using  (6.4),  we   have 
K 


^i  "^  ^    ^i\  "  °-  '  ^^'^^ 


k=iiH-l 


U. 


~  1 

If  s   >  0,  then  u   is  in  the  final  primitive  set  and  u.-^  ^  0,  j  =  0,...,m 

^     k~      - 
and  I  Y.A   =  -  s   <  0  is  the  appropriate  approximate  optimality  condi- 

k=nH-l  ^   ^  ^ 

tion  for  row  i.   On  the  other  hand,  if  u   is  not  in  the  final  primitive 

k- 
set,  s.  is  not  a  basic  variable  implying  s.  =  0  and  T.  y.^,  =  0  which 

"■  "■         k=m+l   "-  ^ 

is  again  the  appropriate  approximate  optimality  condition. 

Case  two:   k„  =  0  (algorithm  terminated  at  step  (ii)) 

In  this  case,  the  algorithm  has  progressed  from  the  region  of  S  where 

m  m 

E   u  !^  0  to  the  boundary  l      u.  !^  U.   By  the  definition  of  primitive 
i=l  i=l   ^ 

a 

sets,  there  must  be  some  u  ,  1  ^  iJ.  _^  m,  not  in  the  final  primitive  set 

implying  s   =  0.   Since  the  riglit  hand  side  on  row  I   is  1,  there  must  be 

K 
at  least  one  X,  >  0  and  therefore   E    X   >  0.  Me   normalize  the  non- 
^  k=m+l  ^ 

negative  weights  to  sum  to  one  by  taking 


X,  —   ,,  ,  k— m+l,.,,,K. 

k=mfl 

K    __ 
From  row  0,  we   have  Y.  X   =  1  -  "^p,  ^  0,  where  the  inequality  is  strict 

k=m+l      '^  ^ 

(i.arriiig  degeneracy)  because   s      is   laoic.      "nr   rtn.'   i,   we    have 
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K  K 

s.    +     L  Y.\    -   1  -      E  A      =   ;      >    0. 


K 
and   dividing  by      Z  X    , 

k=Tn+l 


i       -      ^     k-^  ^0 

"1^ — T    ^i  ^  ,'^,  ^i\  - 1^ =  "  (^-6) 

k=nH-l  k=m4-l 


Since  the  u   ,  j=0,l, . . . ,m,  in  the  final  primitive  set  all  satisfy 

m   k. 

Z   u^  ^  U,  it  is  appropriate  to  allov-;  the  non-negative  quantity 
i=l 


^0 


K 
k=nrl-l  ^ 


on  every  row  i,  i=i....,n,.   As  before,  s.  >  0  implies  u^  in  the  final 

primitive  set  and  the  condition   Z    yH     -  6  ^  0  is  appropriate. 

k=m+l   ^  ^ 

The  above  analysis  establishes  the  nature  of  the  approximation  in 

terms  of  approximating  optimality  conditions.   It  is  also  natural  to  ask: 

what  is  an  approximately  optimal  solution  u  in   (6.1)  ?   Since  u  °,,..,u 

are  close  together,  it  is  reasonable  to  take  any  one  of  them  or  any  convex 

combination  of  them.   The  best  choice,  however,  appears  to  be  any  optimal 

solution  in  the  linear  programming  problem 


-34- 


w(b)  =  max  w 

s.t.  w  ^  -  ub  +  f(x  )  +  ug(x^),  t  £  T 

m 

^     \1^  (6.7) 

i=l 


u  >  0 


where  T  is  the  index  set  of  the  y  corresponding  to  basic  X  variables  in 
the  terminal  basic  feasible  solution.   We  have  argued  above  that  T  contains 
at  least  one  t. 

Let  u  by  an  optimal  solution  in  (6.7)  and  compute  L(u).   It  is  easy 
to  see  that  u  is  optimal  in  the  dual  problem  (6.1)  if  w(b)  =  L(u).   This 
is  because  L(u)  <  -  ub  +  f  (x*")  +  ug(x^)  for  all  t  and  therefore  the  con- 
straints from  (6.1)  omitted  in  forming  (6.7)  are  satisfied  a  fortiori. 
In  any  case,  w(b)  >_  w(b)  and  w(b)  -  L(u)  ^  0  is  an  upper  bound  on  the 
objective  function  error  of  stopping  with  u. 

Let  us  now  consider  briefly  the  convergence  properties  of  the  simplicial 
approximation  as  the  u  ,  k  >  m  are  increased  in  number  and  the  diameter  of 
the  subsimplices  defined  by  primitive  sets  approach  zero.   For  r  =  1,2,..., 
xet  u    ,...,u     denote  the  defining  points  for  the  terminal  primitive 
set  at  the  r   partition  of  S.   As  r  goes  to  infinity,  the  diameter  of 
these  subsimplices  goes  to  zero,  and  there  is  a  subsequence  of  the  u'^^'"', 
..•,u     converging  to  a  single  point  u*.   Any  such  point  must  be  optimal 
in  (6.1).   To  see  this,  recall  that  there  is  only  a  finite  number  of  distinct 
columns  possible  for  the  system  Ay  =  b.   Thus,  there  is  some  basic  set  re- 
peated infinitely  often  in  the  converging  subsequence.   Since  the  sets  0 

^t 

defined  in  section  3  are  closed,  the  approximate  optimality  conditions  re- 
presented by  this  fixed  basic  set  become  exact  at  u*. 
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An  extremely  important  extension  which  we  have  ignored  is  the  pro- 
cedure for  Eaves  [6]  for  restarting  the  simplicial  approximation  search 
with  a  finer  grid  using  the  terminal  solution  obtained  with  the  current 
grid.   A  second  construction  which  may  be  very  important  if  simplicial 
approximation  is  to  be  computationally  feasible  is  the  variation  of  Kuhn 
in  [25]  which  allows  the  dimension  of  the  search  at  each  iteration  to  be 
limited  to  the  number  of  positive  multipliers.   Simplicial  approximation 
has  been  computationally  attractive  on  problems  of  up  to  40  dimensions. 
It  is  conceivable  that  a  simplicial  approximation  search  of  a  300  row 
integer  programming  problem  in  the  form  of  section  2.2  would  involve  no 
more  than  40  positive  multipliers  (i.e.  40  binding  constraints)  at  any 
iteration. 
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7,  Subqradient  Relaxation  Methods  for  Dual  Problem 

In  this  section  we  discuss  the  '^ery   interesting  work  of  Held  and  Karp 
[23]  and  Held  and  Karp  and  Wolfe  [24]  on  the  application  and  extension  of 
"the  relaxation  method  for  linear  inequalities"  of  Agmon  [1]  and  Motzkin 
and  Schoenberg  [27]  to  the  traveling  salesman  and  other  optimization  prob- 
lems. We  will  refer  to  this  method  as  subgradient  relaxation  to  distinguish 
it  from  the  data  and  problem  relaxation  methods  discussed  in  section  3. 
Our  discussion  in  this  section  will  be  brief  and  the  development  will  follow 
the  development  of  [23]  incorporating  an  observation  from  [24]. 

The  idea  behind  subgradient  relaxation  is  to  generate  a  sequence  {u*^} 
of  non-negative  dual  variables  by  the  rule 

u^.   =  max{u^  +  e^Y^. ,  0}     i=l,...,m, 

where  y^   is  any  subgradient  of  L  at  u*^  and  6  is  a  positive  constant  to  be 
chosen  from  a  specified  range.  There  is  no  guarantee  that  L(u   )  >  L{u  ) 
but  if  w(b)  is  known,  the  method  can  be  made  to  converge  (in  a  certain  case, 
finitely)  to  a  u  such  that  L(u)  >^  w  where  w  is  arbitrarily  close  to  w(b). 
Of  course,  w{b)  is  generally  not  known  although  it  can  often  be  closely 
estimated.  Another  point  worth  noting  is  that  in  using  the  subgradient 
relaxation  method  in  the  branch  and  bound  algorithm  of  figure  4.1,  we  can 
take  w  to  be  the  lower  bound  required  for  fathoming  the  subproblem  by  bound 
in  step  5.  In  any  case,  the  justification  for  the  subgradient  relaxation 
method  rests  with  its  computation  effectiveness.  The  method  has  proven 
effective  for  the  traveling  salesman  problem  [23]. 
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Consider  now  a  generic  step  from  the  point  u  ^  0  to  the  point  u  +  ey  +  6 
where  e  is  some  positive  number  to  be  chosen,  y  is  a  subgradient  of  L  at  u 
and  (5.  =  0  if  u.  +  ey^  ^0,  6.  =  -  u.  -  Oy^.  <  0  if  u.  +  66.  <  0,  i=l ,. . .  ,m. 
The  following  lemma  stated  without  proof  summarizes  the  results  on  page  9 
of  [23]  and  page  509  of  [24].  In  words,  it  says  that  if  a  sufficiently 
small  step  in  the  direction  of  the  subgradient  y  is  taken,  followed  by  a 
projection  onto  the  non-negative  orthant,  then  the  new  point  u  +  ey  +  6 
is  closer  to  a  maximal  u*  than  u. 

Lemma  7.1  (Held,  Karp  and  Wolfe):  Let  u  >_  0  with  dual  value  L{u)  be  given. 
Suppose  u  is  such  that  u  >^  0  and  L(u)  >  L(u).  If  6  is  chosen  to  satisfy 


0  <  9  <  ^^L^")  -  L(")) 

\\y\r 


then 

I |u  -  {u+ey+6)| I  <  I |u  -  u] 


This  lemma  indicates  the  type  of  step  size  required  for  convergence. 
Assume  w  <  w(b)  and  let  U-  denote  the  set  of  u  e  R*^  satisfying 


w  <  -  ub  +  f(x*)  +  ug(x*)     t=l,...,T 


u  >  0. 


Let  {u  }  be  a  sequence  of  points  given  by 


(7.1) 


u\''   =max{u^.p  (^^^^)y.,  0}  (7.2) 
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The  results  of  Agmon  [1]  and  Motzkin  and  Schoenberg  [27]  are  concerned  with 

convergence  properties  of  this  sequence.  Roughly  speaking,  if  p  =  2  for 

all  r,  the  sequence  always  includes  a  point  u  e  U-  whereas  if  0  <  e  <  p  <  2 

for  all  r,  the  sequence  includes  a  point  u  e  U-  or  converges  to  a  point 

on  the  boundary  of  U-. 

w 

The  advantages  of  subgradient  relaxation  over  the  primal -dual  ascent 
algorithm  of  section  5  is  the  elimination  of  computational  overhead  while 
maintaining  satisfactory  or  good  convergence  properties.  One  disadvantage 
is  the  absence  of  guaranteed  monotdnically  increasing  lower  bounds.  More- 
over, when  there  are  constraints  other  than  u  >_  0  on  the  dual  problem  (1.2) 
(e.g.,  the  IP  dual  in  [11]),  the  subgradient  relaxation  method  requires 
that  the  point  u  +  ey  be  projected  in  a  non-trivial  fashion  onto  the  set 
of  dual  feasible  solutions.  Computational  experiments  are  underway  to 
compare  and  integrate  the  primal -dual  ascent  algorithm  and  subgradient 
relaxation  method  for  a  variety  of  discrete  optimization  problems. 
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